
    Explainable deep learning models for biological sequence classification

    Biological sequences (DNA, RNA and proteins) orchestrate the behavior of all living cells, and understanding the mechanisms that govern and regulate the interactions among these molecules has motivated biological research for many years. The introduction of experimental protocols that analyze such interactions on a genome- or transcriptome-wide scale has also established machine learning in our field as a way to make sense of the vast amounts of generated data. Recently, deep learning, a branch of machine learning based on artificial neural networks, and especially convolutional neural networks (CNNs) were shown to deliver promising results for predictive tasks and automated feature extraction. However, the resulting models are often very complex, which makes model application and interpretation hard, yet the ability to interpret which features a model has learned from the data is crucial for understanding and explaining new biological mechanisms. This work therefore presents pysster, our open source software library that enables researchers to more easily train, apply and interpret CNNs on biological sequence data. We evaluate and implement different feature interpretation and visualization strategies and show that the flexibility of CNNs allows the integration of additional data beyond pure sequences to improve the biological interpretability of features. We demonstrate this by building, among others, predictive models for transcription factor and RNA-binding protein binding sites and by supplementing these models with structural information in the form of DNA shape and RNA secondary structure. Features learned by the models are then visualized as sequence and structure motifs together with information about motif locations and motif co-occurrence. By further analyzing an artificial data set containing implanted motifs, we also illustrate how the hierarchical feature extraction process in a multi-layer deep neural network operates.
Finally, we present a larger biological application by predicting RNA binding of proteins for transcripts for which experimental protein-RNA interaction data are not yet available. Here, the comprehensive interpretation options of CNNs made us aware of potential technical bias in the experimental eCLIP (enhanced crosslinking and immunoprecipitation) data that were used as a basis for the models. This allowed subsequent tuning of the models and data to obtain more meaningful predictions in practice.
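The core operation of a sequence CNN can be sketched in a few lines: a DNA sequence is one-hot encoded, a convolutional filter is slid along it, and ReLU plus global max pooling report how strongly and where a motif occurs. This is a minimal illustration in plain Python, not the pysster API; the hand-set `TATA` filter and the helper names (`one_hot`, `conv1d_scan`, `relu_max_pool`) are hypothetical, and in a trained CNN the filter weights would be learned from data.

```python
from typing import List, Tuple

ALPHABET = "ACGT"

def one_hot(seq: str) -> List[List[int]]:
    """Encode a DNA sequence as a len(seq) x 4 one-hot matrix (columns A, C, G, T)."""
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in seq.upper()]

def conv1d_scan(encoded: List[List[int]], kernel: List[List[float]]) -> List[float]:
    """Slide a width x 4 kernel along the sequence (stride 1, valid positions only).

    Like CNN 'convolution' layers, this is technically a cross-correlation.
    """
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        window = encoded[start:start + width]
        score = sum(w * x for row_w, row_x in zip(kernel, window) for w, x in zip(row_w, row_x))
        scores.append(score)
    return scores

def relu_max_pool(scores: List[float]) -> Tuple[float, int]:
    """ReLU followed by global max pooling: strongest activation and its start position."""
    activated = [max(0.0, s) for s in scores]
    best = max(range(len(activated)), key=activated.__getitem__)
    return activated[best], best

# Hypothetical filter for the motif "TATA": +1 for the expected base, -1 for any other base.
motif = "TATA"
kernel = [[1.0 if letter == base else -1.0 for letter in ALPHABET] for base in motif]

scores = conv1d_scan(one_hot("GCGTATAAGC"), kernel)
activation, position = relu_max_pool(scores)
print(activation, position)  # 4.0 3
```

Reading off which filter fires and at which position is, in simplified form, the kind of information the abstract describes as motif locations.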

    A comparative study of machine learning methods for time-to-event survival data for radiomics risk modelling

    Radiomics applies machine learning algorithms to quantitative imaging data to characterise the tumour phenotype and predict clinical outcome. For the development of radiomics risk models, a variety of algorithms is available, and it is not clear which one gives optimal results. Therefore, we assessed the performance of 11 machine learning algorithms combined with 12 feature selection methods, using the concordance index (C-Index), to predict loco-regional tumour control (LRC) and overall survival for patients with head and neck squamous cell carcinoma. All considered algorithms can handle continuous time-to-event survival data. Feature selection and model building were performed on a multicentre cohort (213 patients) and validated on an independent cohort (80 patients). We found several combinations of machine learning algorithms and feature selection methods that achieve similar results, e.g., MSR-RF: C-Index = 0.71 and BT-COX: C-Index = 0.70 in combination with Spearman feature selection. Using the best-performing models, patients were stratified into groups at low and high risk of recurrence. Significant differences in LRC between the two groups were obtained on the validation cohort. Based on the presented analysis, we identified a subset of algorithms that should be considered in future radiomics studies to develop stable and clinically relevant predictive models for time-to-event endpoints.
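The concordance index used to rank the algorithm combinations measures how often a model assigns the higher risk to the patient whose event occurs first. The abstract does not specify the exact estimator, so the sketch below assumes Harrell's C-index for right-censored data, with simplified handling of tied times; 0.5 corresponds to random ranking and 1.0 to perfect ranking. The function name and the toy data are hypothetical.

```python
from itertools import combinations

def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times: observed follow-up times; events: 1 if the event (e.g. recurrence)
    was observed, 0 if censored; risk_scores: model output, higher = riskier.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that i has the shorter observed time.
        if times[j] < times[i]:
            i, j = j, i
        # A pair is comparable only if the shorter time ends in an observed event;
        # pairs with tied times are skipped in this simplified version.
        if times[i] == times[j] or not events[i]:
            continue
        comparable += 1
        if risk_scores[i] > risk_scores[j]:
            concordant += 1.0
        elif risk_scores[i] == risk_scores[j]:
            concordant += 0.5  # tied predictions count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

times = [2, 4, 6, 8]
events = [1, 1, 0, 1]  # the patient followed until t = 6 is censored
risk = [0.9, 0.4, 0.6, 0.1]
print(harrell_c_index(times, events, risk))  # 0.8
```

Censored patients still contribute as the longer-surviving member of a pair, which is why the C-index can be applied to the continuous time-to-event data described above.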

    Intensity modulated radiotherapy for high risk prostate cancer based on sentinel node SPECT imaging for target volume definition

    BACKGROUND: The RTOG 94-13 trial has provided evidence that patients with high risk prostate cancer benefit from additional radiotherapy to the pelvic nodes combined with concomitant hormonal ablation. Since lymphatic drainage of the prostate is highly variable, the optimal target volume definition for the pelvic lymph nodes is problematic. To overcome this limitation, we tested the feasibility of an intensity modulated radiation therapy (IMRT) protocol that takes into account the individual pelvic sentinel node drainage pattern determined by SPECT functional imaging. METHODS: Patients with high risk prostate cancer were included. Sentinel nodes (SN) were localised 1.5–3 hours after injection of 250 MBq (99m)Tc-Nanocoll using a double-headed gamma camera with an integrated X-ray device. All sentinel node localisations were included in the pelvic clinical target volume (CTV). Dose prescriptions were 50.4 Gy (5 × 1.8 Gy / week) to the pelvis and 70.0 Gy (5 × 2.0 Gy / week) to the prostate, including the base of the seminal vesicles or the whole seminal vesicles. Patients were treated with IMRT. Furthermore, a theoretical comparison between IMRT and a three-dimensional conformal technique was performed. RESULTS: Since 08/2003, 6 patients have been treated with this protocol. All patients had detectable sentinel lymph nodes (total 29). Four of the 6 patients showed sentinel node localisations (total 10) that would not have been treated adequately with CT-based planning alone ('geographical miss'). The most common localisation of a probable geographical miss was the perirectal area. The comparison between the dose-volume histograms of IMRT and conventional CT-based planning demonstrated a clear superiority of IMRT when all sentinel lymph nodes were included. IMRT allowed significantly better sparing of normal tissue and reduced the volumes of small bowel, large bowel and rectum irradiated with critical doses. No acute gastrointestinal or genitourinary toxicity of RTOG grade 3 or 4 occurred.
CONCLUSION: IMRT based on sentinel lymph node identification is feasible and reduces the probability of a geographical miss. Furthermore, IMRT allows pronounced sparing of normal tissue. Thus, the chosen approach will help to increase the curative potential of radiotherapy in high risk prostate cancer patients.
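The dose prescriptions above translate into fraction counts and overall treatment times by simple division (fractions = total dose / dose per fraction; weeks = fractions / 5). The sketch below assumes five fractions per week, as stated, and converts each prescription independently, since the abstract does not say how the pelvic and prostate courses are sequenced; the function name is hypothetical.

```python
import math

def fractionation(total_gy: float, per_fraction_gy: float, fractions_per_week: int = 5):
    """Return (number of fractions, treatment weeks) for a daily-fractionated prescription."""
    n = total_gy / per_fraction_gy
    # Guard against prescriptions that are not a whole number of fractions
    # (isclose absorbs floating-point noise such as 50.4 / 1.8).
    if not math.isclose(n, round(n)):
        raise ValueError("total dose is not a whole number of fractions")
    n = round(n)
    return n, n / fractions_per_week

print(fractionation(50.4, 1.8))  # (28, 5.6)  pelvis
print(fractionation(70.0, 2.0))  # (35, 7.0)  prostate
```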

    Predictive modeling of long non-coding RNA chromatin (dis-)association

    Long non-coding RNAs (lncRNAs) are involved in gene expression regulation in cis and in trans. Although lncRNAs are enriched in the chromatin fraction of the cell, the degree to which this defines their broad range of functions remains unclear. In addition, the factors that contribute to lncRNA chromatin tethering, as well as the molecular basis of efficient lncRNA chromatin dissociation and its functional impact on enhancer activity and target gene expression, remain to be resolved. Here, we combine pulse-chase metabolic labeling of nascent RNA with chromatin fractionation and transient transcriptome sequencing to follow nascent RNA transcripts from their co-transcriptional state to their release into the nucleoplasm. By incorporating functional and physical characteristics into machine learning models, we find that parameters such as co-transcriptional splicing contribute to efficient lncRNA chromatin dissociation. Intriguingly, lncRNAs transcribed from enhancer-like regions display reduced chromatin retention, suggesting that, in addition to splicing, lncRNA chromatin dissociation may contribute to enhancer activity and target gene expression.

    Generic accelerated sequence alignment in SeqAn using vectorization and multi-threading

    Motivation: Pairwise sequence alignment is undoubtedly a central tool in many bioinformatics analyses. In this paper, we present a generically accelerated module for pairwise sequence alignments applicable to a broad range of applications. In our module, we unified the standard dynamic programming kernel used for pairwise sequence alignments and extended it with a generalized inter-sequence vectorization layout, such that many alignments can be computed simultaneously by exploiting the SIMD (Single Instruction Multiple Data) instructions of modern processors. We then extended the module by adding two layers of thread-level parallelization, where we a) distribute many independent alignments across multiple threads and b) inherently parallelize a single alignment computation using a work-stealing approach that produces a dynamic wavefront progressing along the minor diagonal. Results: We evaluated our alignment vectorization and parallelization on different processors, including the newest Intel® Xeon® (Skylake) and Intel® Xeon Phi™ (KNL) processors, and on different use cases. The AVX512-BW (Byte and Word) instruction set, available on Skylake processors, can genuinely improve the performance of vectorized alignments. We could run single alignments 1600 times faster on the Xeon Phi™ and 1400 times faster on the Xeon® than with our previous sequential alignment module. Availability: The module is programmed in C++ using the SeqAn (Reinert et al., 2017) library and is distributed with version 2.4 under the BSD license. We support SSE4, AVX2 and AVX512 instructions and included UME::SIMD, a SIMD-instruction wrapper library, so that the module can be extended to further instruction sets. We thoroughly test all alignment components with all major C++ compilers on various platforms.
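The inter-sequence vectorization layout can be illustrated without SIMD intrinsics: instead of filling one dynamic programming matrix per pair, each matrix cell holds a vector of scores with one lane per alignment, and every update is applied to all lanes at once. The sketch below uses plain Python lists as stand-ins for SIMD registers, a hypothetical linear scoring scheme (match +1, mismatch -1, gap -1) and global (Needleman-Wunsch) alignment. It shows the data layout only, not the SeqAn API or its performance, and it assumes equal sequence lengths across the batch.

```python
from typing import List, Tuple

MATCH, MISMATCH, GAP = 1, -1, -1  # hypothetical linear scoring scheme

def nw_score(a: str, b: str) -> int:
    """Scalar global alignment (Needleman-Wunsch) score, one pair at a time."""
    prev = [j * GAP for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * GAP] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            cur[j] = max(diag, prev[j] + GAP, cur[j - 1] + GAP)
        prev = cur
    return prev[-1]

def nw_score_batch(pairs: List[Tuple[str, str]]) -> List[int]:
    """Inter-sequence layout: every DP cell holds one score per pair ("lane").

    Each list comprehension over lanes stands in for one SIMD instruction.
    Real implementations pad or mask unequal lengths; this sketch requires
    equal lengths instead.
    """
    lanes = range(len(pairs))
    n, m = len(pairs[0][0]), len(pairs[0][1])
    assert all(len(a) == n and len(b) == m for a, b in pairs), "sketch needs equal lengths"
    prev = [[j * GAP for _ in lanes] for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [[i * GAP for _ in lanes]] + [None] * m
        for j in range(1, m + 1):
            subst = [MATCH if pairs[k][0][i - 1] == pairs[k][1][j - 1] else MISMATCH for k in lanes]
            diag = [prev[j - 1][k] + subst[k] for k in lanes]
            up = [prev[j][k] + GAP for k in lanes]
            left = [cur[j - 1][k] + GAP for k in lanes]
            cur[j] = [max(d, u, l) for d, u, l in zip(diag, up, left)]
        prev = cur
    return prev[m]

pairs = [("GATTACA", "GCATGCU"), ("ACGTACG", "ACGTTCG"), ("AAAAAAA", "TTTTTTT")]
print(nw_score_batch(pairs) == [nw_score(a, b) for a, b in pairs])  # True
```

Because every lane executes the same recurrence, the batched scores agree with the scalar ones; the speed-up in a real implementation comes from mapping each lane-wise operation onto a single vector instruction.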

    Advanced Data Processing – The Cornerstone of Efficient GCxGC/TOF MS Method Development

    Purpose: The combination of ionizing radiation with the pro-apoptotic TRAIL receptor antibody lexatumumab has been shown to exert considerable synergistic apoptotic effects in vitro and in short-term growth delay assays. To clarify the relevance of these effects for local tumour control, long-term experiments using a colorectal xenograft model were conducted. Materials and methods: Colo205-xenograft-bearing NMRI (nu/nu) nude mice were treated with fractionated irradiation (5 × 3 Gy, d1-5) and lexatumumab (0.75 mg/kg, d1, 4 and 8). The tumour-bearing hind limbs were irradiated with graded single top-up doses at d8 under normoxic (ambient) and acute hypoxic (clamped) conditions. Experimental animals were observed for 270 days. Growth delay and local tumour control were the end points of the study. Statistical analysis of the experiments included evaluation of tumour regrowth and local tumour control. Results: Combined treatment with irradiation and lexatumumab led to a pronounced tumour regrowth delay compared with irradiation alone. The long-term experiments presented here revealed a highly significant rise in local tumour control for normoxic (ambient) (p = 0.000006) and hypoxic treatment (p = 0.000030). Conclusion: Our data show that combining the pro-apoptotic antibody lexatumumab with irradiation reduces tumour regrowth and markedly increases local tumour control in a nude mouse model. This substantial effect was observed under ambient conditions and was more pronounced under hypoxic conditions.

    Evaluation of prognostic factors and role of participation in a randomized trial or a prospective registry in pediatric and adolescent nonmetastatic medulloblastoma: a report from the HIT 2000 trial

    Purpose: We aimed to compare treatment results in and outside of a randomized trial and to confirm factors influencing outcome in a large retrospective cohort of nonmetastatic medulloblastoma treated in Austria, Switzerland and Germany. Methods and Materials: Patients with nonmetastatic medulloblastoma (n = 382) aged 4 to 21 years who underwent primary neurosurgical resection between 2001 and 2011 were assessed. Between 2001 and 2006, 176 of these patients (46.1%) were included in the randomized HIT SIOP PNET 4 trial. From 2001 to 2011, an additional 206 patients were registered with the HIT 2000 study center and underwent the identical central review program. Three different radiation therapy protocols were applied. The genetically defined tumor entity (former molecular subgroup) was available for 157 patients. Results: Median follow-up time was 7.3 (range, 0.09-13.86) years. There was no difference between HIT SIOP PNET 4 trial patients and observational patients outside the randomized trial, with 7-year progression-free survival (PFS) rates of 79.5% ± 3.1% versus 78.7% ± 3.1% (P = .62). On univariate analysis, the time interval between surgery and irradiation (≤ 48 days vs ≥ 49 days) showed a strong trend toward affecting PFS (80.4% ± 2.2% vs 64.6% ± 9.1%; P = .052). Furthermore, histologically and genetically defined tumor entities and the extent of postoperative residual tumor influenced PFS. On multivariate analyses, the genetically defined tumor entity (wingless-related integration site [WNT]-activated vs non-WNT/non-SHH, group 3; hazard ratio, 5.49; P = .014) and the time interval between surgery and irradiation (hazard ratio, 2.2; P = .018) were confirmed as independent risk factors. Conclusions: With a centralized review program and risk-stratified therapy for all patients registered with the study center, outcome was identical for patients with nonmetastatic medulloblastoma treated on and off the randomized HIT SIOP PNET 4 trial.
The prognostic values of prolonged time to radiation therapy (RT) and of the genetically defined tumor entity were confirmed.

    Assessment and safety of operation of oil and gas pipelines in non-steady conditions of technological parameters

    Relevance: Compared with stationary operating conditions, changes in the technological parameters of product pumping during the operation of oil and gas pipelines give rise to additional mechanical stresses in the pipe wall and reduce the strength margins. As a result, the service life specified for the pipelines at the design stage decreases, and the risk of failures increases. This motivates the development of methods for assessing and ensuring the safety of oil and gas pipelines under non-stationary technological operating parameters. Aim: to assess and ensure the operational safety of oil and gas pipelines under non-stationary pumping parameters. Object: the pipeline system of the oil and gas industry. Methods: theoretical studies of the operational safety of oil and gas pipelines under non-stationary pumping regimes. Results: The authors derived analytical dependences of pipeline strength margins on the parameters of pumping-regime non-stationarity and give recommendations for ensuring the safety of oil and gas pipelines under non-stationary technological operating parameters. Conclusions: Under non-stationary technological operating parameters, increased mechanical stresses arise in the pipe walls, reducing the safety and service life of the structure. Under identical internal-pressure loading, the highest stresses arise in the sections where the pipeline connects to equipment that is absolutely rigid against deformation.
Mechanical stresses in the pipe wall are reduced by smooth regulation of the pumping regime, which is implemented on oil pipelines by main pumps equipped with variable-frequency electric drives. The operational safety of oil and gas pipelines under non-stationary pumping parameters can therefore be achieved by regulating the pumping regime with pumping units equipped with variable-speed drives.